naive method
Teaching Old Tokenizers New Words: Efficient Tokenizer Adaptation for Pre-trained Models
Purason, Taido, Chizhov, Pavel, Yamshchikov, Ivan P., Fishel, Mark
Tokenizer adaptation plays an important role in transferring pre-trained language models to new domains or languages. In this work, we address two complementary aspects of this process: vocabulary extension and pruning. The common approach to extension trains a new tokenizer on domain-specific text and appends the tokens that do not overlap with the existing vocabulary, which often results in many tokens that are unreachable or never used. We propose continued BPE training, which adapts a pre-trained tokenizer by continuing the BPE merge learning process on new data. Experiments across multiple languages and model families show that this approach improves tokenization efficiency and leads to better utilization of added vocabulary. We also introduce leaf-based vocabulary pruning, which removes redundant tokens while preserving model quality. Together, these methods provide practical tools for controlled vocabulary modification, which we release as an open-source package.
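The idea of continuing BPE merge learning can be illustrated with a toy sketch. This is not the authors' released package: it works on characters rather than bytes, skips pre-tokenization, and the function names `apply_merges` and `continue_bpe` are hypothetical. It only shows that new merges are learned on top of the segmentation produced by the existing merges, so each added token is reachable by construction.

```python
from collections import Counter

def apply_merges(word, merges):
    """Segment a word (string or symbol list) by applying merges in order."""
    symbols = list(word)
    for left, right in merges:
        i, merged = 0, []
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

def continue_bpe(merges, corpus_words, num_new_merges):
    """Extend an existing merge list by learning further merges on new data."""
    merges = list(merges)
    word_freqs = Counter(corpus_words)
    # Start from the segmentation the pre-trained merges already produce.
    segmented = {w: apply_merges(w, merges) for w in word_freqs}
    for _ in range(num_new_merges):
        pair_counts = Counter()
        for w, symbols in segmented.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += word_freqs[w]
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        segmented = {w: apply_merges(s, [best]) for w, s in segmented.items()}
    return merges

old_merges = [("l", "o"), ("lo", "w")]
new_words = ["lower"] * 3 + ["lowest"] * 2 + ["slow"]
extended = continue_bpe(old_merges, new_words, num_new_merges=2)
print(extended[len(old_merges):])
```

Because the old merges are kept as a prefix of the extended list, text that the original tokenizer already handled is segmented the same way until one of the new merges applies.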
ACS: An interactive framework for conformal selection
Gui, Yu, Jin, Ying, Nair, Yash, Ren, Zhimei
This paper presents adaptive conformal selection (ACS), an interactive framework for model-free selection with guaranteed error control. Building on conformal selection (Jin and Candès, 2023b), ACS generalizes the approach to support human-in-the-loop adaptive data analysis. Under the ACS framework, we can partially reuse the data to boost the selection power, make decisions on the fly while exploring the data, and incorporate new information or preferences as they arise. The key to ACS is a carefully designed principle that controls the information available for decision making, allowing the data analyst to explore the data adaptively while maintaining rigorous control of the false discovery rate (FDR). Based on the ACS framework, we provide concrete selection algorithms for various goals, including model update/selection, diversified selection, and incorporating newly available labeled data. The effectiveness of ACS is demonstrated through extensive numerical simulations and real-data applications in large language model (LLM) deployment and drug discovery.
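For orientation, the non-interactive conformal selection procedure that ACS builds on can be sketched as conformal p-values followed by the Benjamini-Hochberg (BH) step. This is a minimal sketch, not ACS itself: it assumes scores are oriented so that a smaller test score is stronger evidence for selection, it ignores tie-breaking, and the function names are hypothetical.

```python
import numpy as np

def conformal_p_values(calib_scores, test_scores):
    """p_j = (1 + #{calibration scores strictly below test score j}) / (n + 1)."""
    calib = np.asarray(calib_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    below = (calib[None, :] < test[:, None]).sum(axis=1)
    return (1 + below) / (n + 1)

def benjamini_hochberg(p_values, q):
    """Return the sorted indices selected by BH at nominal FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = np.nonzero(p[order] <= thresholds)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    k = passed[-1] + 1  # largest rank whose p-value clears its threshold
    return np.sort(order[:k])

calib = np.arange(1, 20)           # 19 illustrative calibration scores
test = np.array([0.5, 10.0, 25.0])  # illustrative test scores
p = conformal_p_values(calib, test)
selected = benjamini_hochberg(p, q=0.2)
print(p, selected)
```

The interactive features described in the abstract (data reuse, on-the-fly decisions, new labeled data) are not represented here.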
Reviews: Robust Principal Component Analysis with Adaptive Neighbors
Update: Thanks for the feedback; I have read it. However, it has not convinced me to change my decision. For Q2, if the framework is general, the authors should have extended it to more than one case. Otherwise, the authors should focus on PCA instead of claiming that the framework is general. For Q3 and Q4, I think the discussion of how to choose k and d is not sufficient in the paper.
Maximizing the Impact of Deep Learning on Subseasonal-to-Seasonal Climate Forecasting: The Essential Role of Optimization
Guo, Yizhen, Zhou, Tian, Jiang, Wanyi, Wu, Bo, Sun, Liang, Jin, Rong
Weather and climate forecasting is vital for sectors such as agriculture and disaster management. Although numerical weather prediction (NWP) systems have advanced, forecasting at the subseasonal-to-seasonal (S2S) scale, spanning 2 to 6 weeks, remains challenging due to the chaotic and sparse atmospheric signals at this interval. Even state-of-the-art deep learning models struggle to outperform simple climatology models in this domain. This paper identifies optimization, rather than network structure, as a likely root cause of this performance gap and develops a novel multi-stage optimization strategy to close the gap. Extensive empirical studies demonstrate that our multi-stage optimization approach significantly improves key skill metrics, PCC and TCC, while using the same backbone structure, surpassing the state-of-the-art NWP system (ECMWF-S2S) by 19-91%. Our research contests the recent finding that direct forecasting outperforms rolling forecasting for S2S tasks. Through theoretical analysis, we propose that the underperformance of rolling forecasting may arise from the accumulation of Jacobian matrix products during training. Our multi-stage framework can be viewed as a form of teacher forcing that addresses this issue. Code is available at https://anonymous.4open.science/r/Baguan-S2S-23E7/
Reviews: Learning with Feature Evolvable Streams
This paper formalizes a new problem setting, Feature Evolvable Streaming Learning. Sensors or other devices that extract feature values have limited lifespans; these devices are therefore periodically replaced, and the associated feature space changes. This learning paradigm relies on an overlapping period to adapt to the new feature space. During this overlapping period, learning algorithms receive features from both the old devices and the new devices simultaneously, which lets them capture the relationship between the two feature spaces. This paper develops two learning algorithms that efficiently use previous experience extracted from old training data to train and predict in the new feature space: 1) a weighted-combination-based predictor ensemble method, and 2) dynamic classifier selection.
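A weighted combination of two predictors (one carried over from the old feature space, one trained on the new one) can be sketched as follows. The exponential weight update is an assumption chosen for illustration; the review does not state the paper's exact update rule, and the function names and the step size `eta` are hypothetical.

```python
import math

def combine_predictions(predictions, weights):
    """Weighted combination of the base predictors' outputs."""
    return sum(w * p for w, p in zip(weights, predictions))

def update_weights(weights, losses, eta=1.0):
    """Assumed update: down-weight each predictor exponentially in its loss."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

weights = [0.5, 0.5]  # [old-feature predictor, new-feature predictor]
prediction = combine_predictions([1.0, 3.0], weights)
weights = update_weights(weights, losses=[1.0, 0.0], eta=math.log(2))
print(prediction, weights)
```

Under this assumed rule, the predictor with the lower loss gradually dominates the combination as more rounds are observed.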
Efficient Privacy-Preserving KAN Inference Using Homomorphic Encryption
Lai, Zhizheng, Zhou, Yufei, Zheng, Peijia, Chen, Lin
The recently proposed Kolmogorov-Arnold Networks (KANs) offer enhanced interpretability and greater model expressiveness. However, KANs also present challenges related to privacy leakage during inference. Homomorphic encryption (HE) facilitates privacy-preserving inference for deep learning models, enabling resource-limited users to benefit from deep learning services while ensuring data security. Yet, the complex structure of KANs, incorporating nonlinear elements like the SiLU activation function and B-spline functions, renders existing privacy-preserving inference techniques inadequate. To address this issue, we propose an accurate and efficient privacy-preserving inference scheme tailored for KANs. Our approach introduces a task-specific polynomial approximation for the SiLU activation function, dynamically adjusting the approximation range to ensure high accuracy on real-world datasets. Additionally, we develop an efficient method for computing B-spline functions within the HE domain, leveraging techniques such as repeat packing, lazy combination, and comparison functions. We evaluate the effectiveness of our privacy-preserving KAN inference scheme on both symbolic formula evaluation and image classification. The experimental results show that our model achieves accuracy comparable to plaintext KANs across various datasets and outperforms plaintext MLPs. Additionally, on the CIFAR-10 dataset, our inference latency achieves over 7 times speedup compared to the naive method.
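Since HE schemes evaluate only additions and multiplications, SiLU has to be replaced by a polynomial on a bounded range. The sketch below is not the authors' task-specific scheme: it uses a plain least-squares fit with NumPy, and choosing the range from observed pre-activation samples (with a hypothetical `margin` factor) is an assumption made to illustrate a data-dependent approximation range.

```python
import numpy as np

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def fit_silu_polynomial(samples, degree=7, margin=1.1):
    """Fit a polynomial to SiLU on a range derived from sample inputs."""
    bound = margin * float(np.max(np.abs(samples)))
    grid = np.linspace(-bound, bound, 2001)
    poly = np.polynomial.Polynomial.fit(grid, silu(grid), degree)
    return poly, bound

samples = np.linspace(-2.0, 2.0, 50)  # stand-in for observed pre-activations
poly, bound = fit_silu_polynomial(samples, degree=7)
xs = np.linspace(-bound, bound, 500)
print(bound, float(np.max(np.abs(poly(xs) - silu(xs)))))
```

A wider range or a lower degree increases the approximation error, which is the trade-off that a range chosen per task is meant to manage.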
Assisted Path Planning for a UGV-UAV Team Through a Stochastic Network
Bhadoriya, Abhay Singh, Rathinam, Sivakumar, Darbha, Swaroop, Casbeer, David W., Manyam, Satyanarayana G.
In this article, we consider a multi-agent path planning problem in a stochastic environment. The environment, which can be an urban road network, is represented by a graph where the travel time for selected road segments (impeded edges) is a random variable because of traffic congestion. An unmanned ground vehicle (UGV) wishes to travel from a starting location to a destination while minimizing the arrival time at the destination. The UGV can traverse an impeded edge, but the true travel time is only realized at the end of that edge. This implies that the UGV can potentially get stuck in an impeded edge with a high travel time. A support vehicle, such as an unmanned aerial vehicle (UAV), is simultaneously deployed from its starting position to assist the UGV by inspecting and realizing the true cost of impeded edges. With the updated information from the UAV, the UGV can efficiently reroute its path to the destination. The UGV does not wait at any time until it reaches the destination. The UAV is permitted to terminate its path at any vertex. The goal is then to develop an online algorithm to determine efficient paths for the UGV and the UAV based on the current information so that the UGV reaches the destination in minimum time. We refer to this problem as Stochastic Assisted Path Planning (SAPP). We present the Dynamic $k$-Shortest Path Planning (D*KSPP) algorithm for the UGV planning and a Rural Postman Problem (RPP) formulation for the UAV planning. Due to the scalability challenges of RPP, we also present a heuristic-based Priority Assignment Algorithm (PAA) for the UAV planning. Computational results are presented to corroborate the effectiveness of the proposed algorithm in solving SAPP.
Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories
Lan, Li-Cheng, Zhang, Huan, Hsieh, Cho-Jui
In this paper, we define, evaluate, and improve the "relay-generalization" performance of reinforcement learning (RL) agents on out-of-distribution "controllable" states. Ideally, an RL agent that generally masters a task should reach its goal starting from any controllable state of the environment instead of memorizing a small set of trajectories. For example, a self-driving system should be able to take over control from a human in the middle of driving and must continue to drive the car safely. To practically evaluate this type of generalization, we start the test agent from the middle of the trajectories of other, independently well-trained stranger agents. With extensive experimental evaluation, we show the prevalence of generalization failure on controllable states from stranger agents. For example, in the Humanoid environment, we observed that a well-trained Proximal Policy Optimization (PPO) agent, with only a 3.9% failure rate during regular testing, failed on 81.6% of the states generated by well-trained stranger PPO agents. To improve "relay generalization," we propose a novel method called Self-Trajectory Augmentation (STA), which resets the environment to the agent's old states according to the Q function during training. After applying STA to the training procedure of Soft Actor Critic (SAC), we reduced the failure rate of SAC under relay-evaluation by more than three times in most settings without impacting agent performance or increasing the number of environment interactions needed. Our code is available at https://github.com/lan-lc/STA. Generalization is critical for deploying reinforcement learning (RL) agents into real-world applications. A well-trained RL agent that can achieve high rewards under restricted settings may not be able to handle the enormous state space and complex environment variations in the real world. There are many different aspects regarding the generalization of RL agents.
A Knowledge Distillation-Based Backdoor Attack in Federated Learning
Wang, Yifan, Fan, Wei, Yang, Keke, Alhusaini, Naji, Li, Jing
Federated Learning (FL) is a novel framework for decentralized machine learning. Due to its decentralized nature, FL is vulnerable to adversarial attacks during training, e.g., backdoor attacks. A backdoor attack aims to inject a backdoor into the machine learning model such that the model exhibits arbitrarily incorrect behavior on test samples that carry a specific backdoor trigger. Although a range of backdoor attack methods for FL has been introduced, there are also methods that defend against them. Many of these defenses exploit the abnormal characteristics of backdoored models or the difference between backdoored models and regular models. To bypass these defenses, we need to reduce the difference and the abnormal characteristics. We find that one source of such abnormality is that a backdoor attack directly flips the labels of the data when poisoning it. However, current studies of backdoor attacks in FL do not mainly focus on reducing the difference between backdoored models and regular models. In this paper, we propose Adversarial Knowledge Distillation (ADVKD), a method that combines knowledge distillation with backdoor attacks in FL. With knowledge distillation, we can reduce the abnormal characteristics in the model that result from label flipping, so the model can bypass the defenses. Compared to current methods, we show that ADVKD can not only reach a higher attack success rate but also successfully bypass the defenses when other methods fail. To further explore the performance of ADVKD, we test how the parameters affect its performance under different scenarios. Based on the experimental results, we summarize how to adjust the parameters for better performance under different scenarios. We also use several methods to visualize the effects of different attacks and explain the effectiveness of ADVKD.
Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE
Zhuang, Juntang, Dvornek, Nicha, Li, Xiaoxiao, Tatikonda, Sekhar, Papademetris, Xenophon, Duncan, James
Neural ordinary differential equations (NODEs) have recently attracted increasing attention; however, their empirical performance on benchmark tasks (e.g. image classification) is significantly inferior to that of discrete-layer models. We demonstrate that one explanation for their poorer performance is the inaccuracy of existing gradient estimation methods: the adjoint method has numerical errors in reverse-mode integration; the naive method directly back-propagates through ODE solvers, but suffers from a redundantly deep computation graph when searching for the optimal stepsize. We propose the Adaptive Checkpoint Adjoint (ACA) method: in automatic differentiation, ACA applies a trajectory checkpoint strategy which records the forward-mode trajectory as the reverse-mode trajectory to guarantee accuracy; ACA deletes redundant components for shallow computation graphs; and ACA supports adaptive solvers. On image classification tasks, compared with the adjoint and naive methods, ACA achieves half the error rate in half the training time; NODE trained with ACA outperforms ResNet in both accuracy and test-retest reliability. On time-series modeling, ACA outperforms competing methods. Finally, in an example of the three-body problem, we show that NODE with ACA can incorporate physical knowledge to achieve better accuracy. We provide the PyTorch implementation of ACA: https://github.com/juntang-zhuang/torch-ACA.